Abstract. Interdiction operations involving search, identification, and interception of suspected
objects are of great interest and high operational importance to military and naval forces as well
as nations' coast guards and border patrols. The interdiction scenario discussed in this paper
includes  an  area  of  interest  with  multiple  neutral  and  hostile  objects  moving  through  this  area,  and 
an  interdiction  force,  consisting  of  an  airborne  sensor  and  an  intercepting  surface  vessel  or  ground 
vehicle,  whose  objectives  are  to  search,  identify,  track,  and  intercept  hostile  objects  within  a 
given  time  frame.  The  main  contributions  of  this  paper  are  addressing  both  airborne  sensor  and 
surface  vessel  simultaneously,  developing  a  stochastic  dynamic-programming  model  for 
optimizing  their  employment,  and  deriving  operational  insight.  In  addition,  the  search  and 
identification  process  of  the  airborne  sensor  addresses  both  physical  (appearance)  and  behavioral 
(movement  pattern)  signatures  of  a  potentially  hostile  object.  As  the  model  is  computationally 
intractable  for  real-world  scenarios,  we  propose  a  simple  heuristic  policy,  which  is  shown,  using  a 
bounding  technique,  to  be  quite  effective.  Based  on  a  numerical  case  study  of  maritime 
interdiction  operations,  which  includes  several  representative  scenarios,  we  show  that  the 
expected  number  of  intercepted  hostile  objects,  following  the  heuristic  decision  policy,  is  at  least 
60%  of  the  number  of  hostile  objects  intercepted  following  an  optimal  decision  policy. 
Area of Review: Military and Homeland Security

1.  Introduction 

Interdiction  operations  involving  search,  identification,  and  interception  of  suspected 
objects  are  of  great  interest  and  high  operational  importance  to  military  and  naval  forces 
as well as nations' coast guards and border patrols [1]. There are two key assets in
interdiction  operations  that  we  consider  in  this  paper:  an  airborne  sensor,  for  example,  a 
patrol  (fixed-wing)  aircraft,  a  helicopter,  or  an  unmanned  aerial  vehicle  (UAV),  whose 
mission is to search, detect, track, and identify potential targets, and a surface vessel or
ground vehicle, which is dispatched following a cue from the sensor to investigate and
potentially apprehend a suspicious object. This study is motivated by current
operational  needs  in  maritime  counter-terrorism,  counter-drug,  and  counter-piracy 
missions.  In  such  targeted  and  focused  missions  only  a  single  airborne  asset  and  a  single 
surface  vessel  may  operate  in  a  certain  part  of  a  region  of  interest  [2].  In  this  paper,  we 
develop  a  stochastic  dynamic-programming  model  for  optimizing  the  combined  operation 
of  these  two  assets.  In  principle,  the  model  is  solvable  by  the  Backward  Dynamic 
Programming  Algorithm  (see  for  example  [3],  p.  50),  but  in  real-world  scenarios  that 
approach  may  not  be  computationally  feasible  due  to  the  model  size.  Consequently,  we 
develop  a  greedy  heuristic  algorithm  that  can  be  used  in  real-time  to  effectively  deploy 
and  employ  the  two  assets.  We  verify  the  quality  of  the  heuristic  by  constructing  a 
relaxation  of  the  model  and  showing  that  for  some  realistic  scenarios  the  heuristic 
generates  solutions  that  are  at  most  40%  from  optimality. 

The  field  of  classical  search  theory,  addressing  the  problem  of  optimal  search  for  static  or 
mobile  targets,  has  been  extensively  studied  for  over  seven  decades,  since  the 
groundbreaking  research  of  Koopman  [4],  through  the  seminal  works  of  Washburn  [5] 
and  Stone  [6],  to  the  recent  surge  in  publications;  for  example  see  [7-19].  The  problem  of 
coordinating  search  and  interception — the  topic  of  this  paper — is  more  involved.  Wein 
and  Atkinson  [20]  study  a  radiation  detection  system,  combined  with  interception  efforts, 
for  protecting  an  urban  area  from  nuclear  terrorist  attack.  Jeffcoat  et  al.  [21]  deal  with 
searching  and  engaging  multiple  targets  where  each  search  or  engagement  asset  can 
engage  at  most  one  target.  Barton  et  al.  [22]  consider  a  team  of  UAVs  comprising  two 
groups:  searchers  that  use  dynamic  co-fields  to  avoid  obstacles,  and  disposable  UAVs 
that  are  called  in,  when  targets  are  found,  to  kill  the  targets;  see  also  [23]  for  a  related 
study.  The  balance  between  search  for  unknown  targets  and  interception  of  known 
targets  represents  a  classical  exploration  versus  exploitation  trade-off  [24],  which  is 
known  to  be  difficult  to  carry  out  optimally.  We  refer  to  [25]  for  a  recent  study  of 
algorithms  and  complexity  results  and  [26]  for  heuristics.  A  related  study  is  also  [27], 
which  deals  with  the  placement  of  stationary  perimeter  cameras  while  accounting  for 
interceptions by an unmanned helicopter following detections by the cameras. We refer
to  [28]  for  a  study  of  object  identification  without  the  need  for  search  and  interception. 

In  contrast  to  many  of  the  above  studies,  which  mostly  focus  on  technical  and  command- 
and-control  aspects  of  employing  a  large  number  of  search  and  interception  assets,  we 
take  an  operational  approach,  which  reflects  typical  current  situations  in  maritime 
missions,  where  interdiction  assets  are  scarce  [2].  We  account  for  possible  identification 
errors,  consider  both  the  physical  signature  of  a  suspicious  object  and  its  movement 
pattern,  and  optimize  routing  and  scheduling  decisions  taken  by  a  task-force  commander. 
The measure of performance is the expected number of targets successfully interdicted.
The main contribution of this paper is threefold: We model the combined effect of the
“eye and the fist,” incorporate information about physical signature and movement
pattern  of  suspicious  objects,  and  derive  operational  insight  about  when  to  trigger 
investigation  by  the  surface  vessel.  In  an  earlier  study  [29]  we  deal  with  a  similar 
situation. However, that study considers neither the tracking of suspicious objects nor
information about their movement patterns, and it lacks the analytical rigor and the
solution-quality  bounds  for  the  proposed  heuristic  algorithm  presented  in  the  current 
paper.  Our  modeling  approach  is  similar  to  that  found  in  the  extensive  literature  on 
stochastic  and  dynamic  task  allocation  and  vehicle  routing  (e.g.,  [3,30]  and  references 
therein),  but  is  specialized  to  the  unique  features  of  interdiction  operations. 

The  next  section  defines  the  operational  scenario.  Section  3  presents  the  stochastic 
dynamic-programming  model.  Section  4  describes  a  heuristic  algorithm  for  solving  the 
model  as  well  as  an  associated  model  that  is  used  to  construct  a  bound  on  the  optimal 
value  of  the  original  model.  Section  5  presents  a  numerical  case  study  for  maritime 
interdiction  missions. 

2. Scenario

We consider an area of interest (AOI) that contains multiple mobile objects. Some of the
objects  are  hostile,  called  targets,  and  the  remaining  are  neutrals.  The  objective  of  the 
interdiction  force  is  to  intercept  as  many  targets  as  possible  within  a  finite  time  horizon 
discretized into time periods. The number of objects that enter, move about, and
(eventually) exit the AOI is unknown. The AOI is subdivided into a finite number of area
cells (ACs). The objects are oblivious to the presence of the interdiction force and
therefore  they  do  not  act  strategically;  they  move  independently  of  each  other  according 
to  a  known  Markov  chain  defined  on  the  set  of  ACs.  The  movements  of  targets  and 
neutrals  may  follow  different  Markov  chains.  An  object  enters  and  departs  the  AOI 
according  to  a  Bernoulli  process.  We  assume  stationarity  in  the  sense  that  neither  the 
entry  probabilities  nor  the  in-AOI  transition  or  exit  probabilities  depend  on  the  time 
period.  Motivated  by  our  discretization  of  space  and  time,  with  resolution  that  can  be 
arbitrarily  high,  and  assuming  that  the  AOI  is  relatively  large  compared  to  the  (unknown) 
number  of  objects,  we  neglect  the  possibility  of  more  than  one  object  in  any  specific  AC 
at  any  given  time  period.  This  is  a  reasonable  approximation  to  the  situation  in  open-sea 
scenarios  and  it  simplifies  the  model.  A  similar  assumption  is  made  in  [31].  The 
interdiction  force  comprises  two  assets:  an  airborne  sensor,  called  a  Recognizer,  whose 
mission  is  to  search,  detect,  track,  and  identify  targets,  and  a  ground  vehicle  or  surface 
vessel,  called  an  Interceptor,  capable  of  intercepting  and  apprehending  a  target.  Figure  1 
presents  an  example  of  such  an  AOI. 


Figure 1. An example of an AOI with multiple objects - neutrals (N) and targets (T),
a Recognizer (R) and an Interceptor (I).

We assume that the Recognizer has perfect detection capabilities, i.e., it can determine
with certainty whether the AC in which it is currently located contains an object. This is
a reasonable assumption as radars usually detect objects such as fishing vessels and
go-fast boats at a substantial range. The Recognizer examines one AC at a time until it
detects  an  object.  Following  detection,  the  Recognizer  tracks  the  object  for  one  time 
period  and  then  determines  the  nature  of  the  object  using  a  threshold  policy  described  in 
Section  3.  The  Recognizer  is  subject  to  both  false  positive  and  false  negative  errors  when 
identifying  an  object.  The  modeling  of  the  tracking  process  is  based  on  a  series  of 
“looks”,  as  described  in  Section  3.2.  For  more  details  on  tracking  see  [32,33].  If  the 
object  is  identified  as  a  neutral,  the  Recognizer  proceeds  with  its  search.  Otherwise,  the 
Recognizer  flags  the  suspected  target  and  calls  in  the  Interceptor.  We  do  not  describe  in 
detail  the  “pursuer-evader  game”  (see  for  example  [34-36])  that  may  take  place  after  an 
object  is  flagged  and  make  the  simplifying  assumption  that  once  flagged,  the  object 
remains  stationary  at  its  location  until  the  arrival  of  the  Interceptor.  The  Recognizer 
remains  with  the  object  until  the  Interceptor  arrives  and  completes  the  interception,  at 
which  time  the  Recognizer  returns  to  its  search.  Any  object  that  is  tracked  by  the 
Recognizer  is  tagged  (e.g.,  electronically)  as  “examined”  and  is  of  no  further  interest. 

The  Interceptor  has  perfect  identification  capability;  it  can  distinguish  with  certainty 
between  a  target  and  a  neutral.  When  not  involved  in  an  intercepting  mission,  the 
Interceptor  moves  according  to  a  given  deterministic  policy.  For  simplicity  of  exposition, 
we assume throughout the paper that the policy is to remain stationary. Thus, the
Interceptor  is  stationary  at  the  location  of  its  last  interception  (or  initial  deployment 
absent  interceptions),  waiting  for  calls  by  the  Recognizer.  However,  other  policies  can 
trivially  be  incorporated  in  the  model.  The  goal  of  the  interdiction  force  is  to  maximize 
the  expected  total  time-discounted  number  of  intercepted  targets  during  the  time  horizon. 
Figure  2  summarizes  the  operational  setting,  where  reward  is  a  time-discounted  value 
collected  for  each  intercepted  target. 


Figure 2. The interdiction scenario


3.  Model  Development 

The dynamic program in this paper is constructed based on the conventions of [3], pp. 129-178.
We first present the main components of the model and then discuss the technical
details  of  probability  updates,  state  transitions,  and  probability  distributions. 

3.1  Main  Components  of  the  Model 

Let $\mathcal{A}$ and $\mathcal{A}_0$ denote the set of ACs in the AOI and the area outside the AOI,
respectively. Let $t = 1, 2, \ldots, T$ denote the (discrete) time index. While we could have
formulated  the  dynamic  program  in  the  classical  manner  with  a  possible  decision  at  each 
time  period,  we  choose  to  adopt  a  somewhat  unconventional  approach  that  is  event- 
rather  than  time-driven.  The  reason  is  that  the  situation  we  consider  involves  substantial 
blocks  of  time  periods  during  which  no  decisions  are  required.  Specifically,  while  the 
Recognizer  is  travelling  to  an  AC,  or  tracking  an  object,  or  waiting  for  the  Interceptor,  no 
decisions  are  expected  to  be  made.  We  utilize  this  special  situation  and  develop  an  event- 
driven  formulation  where  decisions  are  only  made  at  random  time  periods  when  certain 
events  occur.  This  construct  is  described  in  detail  below.  Our  approach  results  in  a  state 
space  of  smaller  cardinality,  which  we  utilize  computationally  in  Sections  4  and  5.  Thus, 
we define a state as a vector $s = (t, r, i, \pi, \theta)$, where $r \in \mathcal{A}$ and $i \in \mathcal{A}$ are the
Recognizer's and Interceptor's locations at time $t$, respectively, and $\pi$ ($\theta$) is a vector of
probabilities with components $\pi_a$ ($\theta_a$), $a \in \mathcal{A}$, with $\pi_a$ ($\theta_a$) being the probability that a
neutral (target) is present in AC $a$ at time period $t$. Let

$$\mathcal{S} \subset \{1, 2, \ldots, T\} \times \mathcal{A} \times \mathcal{A} \times [0,1]^{|\mathcal{A}|} \times [0,1]^{|\mathcal{A}|}$$

be the space of all possible state vectors. The
inclusion of $\mathcal{S}$ in the right-hand side is strict because the probabilities $\pi_a$ and $\theta_a$ may
only  take  on  a  finite  number  of  values  in  a  given  problem  instance  due  to  the  finite 
number  of  detection  and  interception  opportunities  within  the  finite  time  horizon.  Hence, 
the  state-space  S  is  of  finite,  but  extremely  high  cardinality. 

A decision $x \in \mathcal{A}$ determines the next AC to be visited by the Recognizer; this decision
is made either at $t = 0$ or when the existing decision is fathomed. A decision $x \in \mathcal{A}$ is
said to be fathomed in one of the following three situations: (i) no object is found by the
Recognizer in AC $x$, (ii) an object is found in AC $x$ but identified as a neutral, or (iii) an
object is found in AC $x$, identified as a target, intercepted, and determined to be either a
target  or  a  neutral.  As  soon  as  a  decision  is  fathomed,  a  new  decision  is  made.  Each  new 
decision  constitutes  a  stage  in  the  detection-interception  process. 

Let $w = (\Delta t_w, r_w, i_w, z_w)$ denote the vector of random variables representing the
information available when a decision is fathomed. The first component $\Delta t_w$ is the
duration of a stage (i.e., the time between when a decision is made and when it is
fathomed), and the variables $r_w$ and $i_w$ denote the Recognizer's and Interceptor's
locations at the end of a stage, respectively. The Bernoulli random variable $z_w$ equals 1 if
the stage ends with a target interception and 0 otherwise. Let $\mathcal{W}$ denote the space of
possible realizations of $w$. The probability distribution of $w$, which depends on the state
$s$ and the decision $x$, is derived in Section 3.4. To simplify notation we do not
distinguish  between  the  random  vector  w  and  its  realization.  The  meaning  should  be 
clear  from  the  context. 

The next state is determined by the state-transition function $s^M : \mathcal{S} \times \mathcal{A} \times \mathcal{W} \to \mathcal{S}$, which
depends on the current state, the decision, and the information obtained when the
decision is fathomed; see Section 3.3 for details. The reward associated with state
$s = (t, r, i, \pi, \theta)$ and the following realization $w = (\Delta t_w, r_w, i_w, z_w)$ is given by


$$c(w,s) = \begin{cases} z_w \, (1+\gamma)^{-(t+\Delta t_w)}, & t + \Delta t_w \le T \\ 0, & t + \Delta t_w > T. \end{cases} \qquad (1)$$


The  reward  is  0  if  no  target  is  intercepted  or  if  the  time  of  interception  is  beyond  the  time 
horizon,  and  is  a  discounted  value  otherwise.  Note  that  the  reward  only  depends  on  the 
decision  x  through  the  probability  distribution  of  w. 


The Bellman equation for state $s = (t, r, i, \pi, \theta)$ takes the form

$$V(s) = \begin{cases} \max_{x} E\left[ c(w,s) + V\left(s^M(s,x,w)\right) \right], & t \le T \\ 0, & t > T, \end{cases} \qquad (2)$$


where  V(s)  is  the  value  of  being  in  state  s,  and  the  expectation  is  with  respect  to  the 
probability distribution of w (see Section 3.4). The stochastic dynamic-programming
model  in  (2)  is  denoted  by  SDP,  and  the  corresponding  optimal  policy  is  referred  to  as  the 
SDP  policy. 
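
The recursion in (2) can be illustrated with a small memoized routine. This is only a sketch under stated assumptions: the hooks `decisions`, `outcomes`, and `reward` are hypothetical placeholders for the model components developed in Sections 3.2-3.4, states are assumed to be hashable tuples whose first entry is the time index, the value is taken to be zero once that index passes T, and every stage is assumed to last at least one time period so that the recursion terminates.

```python
from functools import lru_cache


def solve_sdp(T, decisions, outcomes, reward):
    """Return a memoized value function V for an event-driven recursion like (2).

    decisions(state)   -> iterable of candidate decisions x      (hypothetical hook)
    outcomes(state, x) -> iterable of (prob, w, next_state)       (hypothetical hook)
    reward(w, state)   -> stage reward c(w, s)                    (hypothetical hook)
    States must be hashable, with the time index stored in state[0].
    """

    @lru_cache(maxsize=None)
    def V(state):
        # Past the horizon nothing more can be collected.
        if state[0] > T:
            return 0.0
        # Expected stage reward plus value of the next state, per decision.
        values = [
            sum(p * (reward(w, state) + V(nxt)) for p, w, nxt in outcomes(state, x))
            for x in decisions(state)
        ]
        return max(values, default=0.0)

    return V
```

Because decisions are made only when the previous one is fathomed, the recursion jumps from one decision epoch to the next rather than stepping through every time period.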


3.2  Probability  Updates 

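Let $P(a',a)$ denote the single time-period transition probability from AC $a'$ to AC $a$ of
a neutral, and let $P = [P(a',a)]$, $a', a \in \mathcal{A} \cup \mathcal{A}_0$, be the corresponding matrix. Similarly,
we define $Q = [Q(a',a)]$ for a target. Let $\alpha_a$ and $\beta_a$ denote the single time-period
arrival probabilities of a neutral and a target, respectively, to AC $a$. Absent the interdiction
force, let $\pi_a^0$ ($\theta_a^0$) be the steady-state probability of a neutral (target) in AC $a \in \mathcal{A}$. In
view of our assumptions,

$$\pi_a^0 = 1 - (1-\alpha_a) \prod_{l \in \mathcal{A}} \prod_{k=1}^{\infty} \left(1 - \alpha_l P^k(l,a)\right), \qquad (3)$$

where $P^k(l,a)$ is the $(l,a)$ entry of $P^k$, the transition matrix $P$ raised to the $k$th power.
Similarly, for targets we obtain that

$$\theta_a^0 = 1 - (1-\beta_a) \prod_{l \in \mathcal{A}} \prod_{k=1}^{\infty} \left(1 - \beta_l Q^k(l,a)\right). \qquad (4)$$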


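In the presence of an interdiction force, these probabilities may be updated as described
in Section 3.3. Let $\pi_a^t$ and $\theta_a^t$ denote the updated probabilities of a neutral and a target in
AC $a$ at time $t$, respectively. Given $\pi_a^t$ and $\theta_a^t$, $a \in \mathcal{A}$, and no updates during $(t, t']$, $t' > t$,
we have that

$$\pi_{a'}^{t'} = 1 - (1-\alpha_{a'}) \prod_{a \in \mathcal{A}} \left(1 - \pi_a^t P^{t'-t}(a,a')\right) \prod_{a \in \mathcal{A}} \prod_{k=1}^{t'-t-1} \left(1 - \alpha_a P^k(a,a')\right), \qquad (5)$$

where the second product in (5) is equal to 1 if $t'-t = 1$. Similarly, for a target,

$$\theta_{a'}^{t'} = 1 - (1-\beta_{a'}) \prod_{a \in \mathcal{A}} \left(1 - \theta_a^t Q^{t'-t}(a,a')\right) \prod_{a \in \mathcal{A}} \prod_{k=1}^{t'-t-1} \left(1 - \beta_a Q^k(a,a')\right). \qquad (6)$$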
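
The propagation step in (5) and (6) can be sketched in a few lines of plain Python. The sketch reads the update as one minus the probability that no object reaches the destination AC by any route: a fresh arrival, an object already present at time t, or an object that arrived in between. The function name `propagate` and the list-based matrix layout are illustrative assumptions, and `trans` is assumed to hold only the transitions among ACs inside the AOI.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def propagate(prob, arrival, trans, dt):
    """Presence probabilities dt >= 1 periods after the last update.

    prob    : presence probability per AC at time t
    arrival : per-period arrival probability per AC
    trans   : one-period transition probabilities among ACs of the AOI
    """
    n = len(prob)
    power = [row[:] for row in trans]  # holds trans ** k, starting at k = 1
    via_arrivals = [1.0] * n
    for _ in range(1, dt):
        # No object that arrived at src during (t, t') reaches dest in k steps,
        # for k = 1, ..., dt - 1 (empty when dt == 1).
        for dest in range(n):
            for src in range(n):
                via_arrivals[dest] *= 1.0 - arrival[src] * power[src][dest]
        power = mat_mul(power, trans)
    # power now holds trans ** dt.
    result = []
    for dest in range(n):
        none_moved_in = 1.0
        for src in range(n):
            none_moved_in *= 1.0 - prob[src] * power[src][dest]
        result.append(
            1.0 - (1.0 - arrival[dest]) * none_moved_in * via_arrivals[dest]
        )
    return result
```

The same routine serves neutrals and targets; only the arguments change (the neutral quantities for (5), the target quantities for (6)).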


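Suppose that the Recognizer visits AC $a$ at time $t$ and let $\pi_a^{t,Det}$ and $\theta_a^{t,Det}$ denote the
updated probabilities following that visit. If the AC is void of objects then
$\pi_a^{t,Det} = \theta_a^{t,Det} = 0$. Otherwise,

$$\pi_a^{t,Det} = \frac{\pi_a^t}{\pi_a^t + \theta_a^t}, \qquad (7)$$

$$\theta_a^{t,Det} = \frac{\theta_a^t}{\pi_a^t + \theta_a^t} = 1 - \pi_a^{t,Det}. \qquad (8)$$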


Following  a  detection  of  an  object,  the  Recognizer  tracks  the  object  for  one  time  period 
and  utilizes  two  modes  of  recognition:  signature  recognition  (e.g.,  using  an  electro- 
optical  sensor)  and  movement  recognition,  in  which  the  Recognizer  tries  to  identify  the 
movement pattern of the tracked object (e.g., leaving known shipping lanes or any other
suspicious  movement).  The  movement  recognition  relates  to  the  extensive  literature  on 
anomaly  detection;  see,  e.g.,  [37-39].  Without  loss  of  generality,  we  assume  that 
signature  recognition  takes  place  first  and  the  Recognizer  takes  g  looks  (glimpses)  at  the 
tracked  object.  The  glimpses  are  conditionally  independent  given  the  presence  of  the 
object in that AC. Let $1 - u$ and $1 - v$ denote the single-glimpse false negative probability
of identifying a target as a neutral, and the false positive probability of identifying a
neutral as a target, respectively. Suppose that $n$ glimpses result in “neutral” cues, $g - n$
glimpses result in “target” cues, and the object moves from AC $a$ to AC $j$ (if the object
leaves the AOI, the decision is fathomed). Let $\pi_j^{t+1,Sig}$ denote the signature-posterior
probability of a neutral following the $g$ glimpses, where

$$\pi_j^{t+1,Sig} = \frac{\binom{g}{n} v^n (1-v)^{g-n} \, \pi_a^{t,Det}}{\binom{g}{n} v^n (1-v)^{g-n} \, \pi_a^{t,Det} + \binom{g}{n} (1-u)^n u^{g-n} \, \theta_a^{t,Det}}, \qquad (9)$$

and, similarly, the signature-posterior probability of a target is

$$\theta_j^{t+1,Sig} = \frac{\binom{g}{n} (1-u)^n u^{g-n} \, \theta_a^{t,Det}}{\binom{g}{n} v^n (1-v)^{g-n} \, \pi_a^{t,Det} + \binom{g}{n} (1-u)^n u^{g-n} \, \theta_a^{t,Det}} = 1 - \pi_j^{t+1,Sig}. \qquad (10)$$


Finally, observing that the object has moved from AC $a$ to AC $j$, the movement
recognition mode takes the posteriors of the signature recognition mode as priors and we
obtain

$$\pi_j^{t+1,Rec} = \frac{P(a,j) \, \pi_j^{t+1,Sig}}{P(a,j) \, \pi_j^{t+1,Sig} + Q(a,j) \, \theta_j^{t+1,Sig}} \qquad (11)$$

for a neutral, and

$$\theta_j^{t+1,Rec} = \frac{Q(a,j) \, \theta_j^{t+1,Sig}}{P(a,j) \, \pi_j^{t+1,Sig} + Q(a,j) \, \theta_j^{t+1,Sig}} \qquad (12)$$

for a target. If (12) exceeds a predetermined threshold $M$, then the object is considered
to be a suspected target and the Interceptor is called in.
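
The chain of updates in (7) through (12) amounts to two successive Bayesian updates followed by a threshold test. The sketch below walks through them for a single tracked object; the function name `recognize` and its argument names are illustrative assumptions rather than notation from the model, and the inputs are assumed to be such that no denominator is zero.

```python
from math import comb


def recognize(pi_t, theta_t, n, g, u, v, p_move, q_move, threshold):
    """Posterior target probability after detection, g glimpses, and one move.

    pi_t, theta_t  : prior probabilities of a neutral / target in the visited AC
    n              : number of glimpses (out of g) that returned a "neutral" cue
    u, v           : per-glimpse probability of a correct cue for a target / neutral
    p_move, q_move : P(a, j) and Q(a, j) for the observed move from AC a to AC j
    threshold      : the flagging threshold M
    """
    # (7)-(8): condition on an object having been detected in the AC.
    pi_det = pi_t / (pi_t + theta_t)
    theta_det = 1.0 - pi_det
    # (9)-(10): binomial likelihood of n "neutral" cues under each hypothesis.
    like_neutral = comb(g, n) * v**n * (1 - v) ** (g - n)
    like_target = comb(g, n) * (1 - u) ** n * u ** (g - n)
    pi_sig = like_neutral * pi_det / (like_neutral * pi_det + like_target * theta_det)
    theta_sig = 1.0 - pi_sig
    # (11)-(12): the observed transition is a second piece of evidence.
    theta_rec = q_move * theta_sig / (p_move * pi_sig + q_move * theta_sig)
    # The object is flagged when the posterior exceeds the threshold.
    return theta_rec, theta_rec > threshold
```

With equal movement probabilities for neutrals and targets the movement step leaves the signature posterior unchanged, which is a convenient sanity check on an implementation.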


3.3  State  Transitions 


Given the state $s = (t, r, i, \pi, \theta)$ at the beginning of a decision stage, the decision $x$, and
the realization of the information vector $w = (\Delta t_w, r_w, i_w, z_w)$, the state-transition function
is

$$s^M(s,x,w) = (t + \Delta t_w, r_w, i_w, \pi^M, \theta^M), \qquad (13)$$

where $\pi^M$ and $\theta^M$ are the probability vectors $\pi$ and $\theta$ of the next state, prior to making
the next decision. There are three time intervals (cases) we potentially need to account for
when computing $\pi^M$ and $\theta^M$: first, the time between making the decision $x$ and the
Recognizer's arrival at $x$; second, the tracking and identification time of the detected
object (a single time period); and third, the waiting time for the Interceptor to arrive and
complete the interception. Figure 3 summarizes the three different cases that may occur,
where $T^R_{r,x}$ is the time required by the Recognizer to move from AC $r$ to AC $x$.

In Case 1, there is no object in AC $x$ and therefore $\pi_x^M = 0$ and, for $a \ne x$, $\pi_a^M$ is given
by (5) with $t'$ replaced by $t + T^R_{r,x}$. Similarly, $\theta_x^M = 0$ and, for $a \ne x$, $\theta_a^M$ is given by (6)
with $t'$ replaced by $t + T^R_{r,x}$. In Case 2, we first compute $\pi_a^M$ and $\theta_a^M$ as described for
Case 1 and denote these values by $\pi_a^{M,temp}$ and $\theta_a^{M,temp}$. Then, we update these values to
account for the single time period of tracking and set $\pi_a^M = 0$ if $a$ is the AC into which the
tracked object has transited at time $t + 1$. Otherwise, we set $\pi_a^M$ as given by (5) with $t'$
and $\pi_a^t$ replaced by $t + 1$ and $\pi_a^{M,temp}$, respectively. A similar computation applies to $\theta_a^M$.


Figure 3. Timeline of state transitions. Case 1: no object is detected in AC $x$; the decision is
fathomed on arrival, at time $t + T^R_{r,x} = t + \Delta t_w$. Case 2: an object is detected in AC $x$ but is
not flagged as a likely target; the decision is fathomed at the end of tracking, at time
$t + T^R_{r,x} + 1 = t + \Delta t_w$. Case 3: an object is detected in AC $x$ and flagged; the decision is
fathomed when the interception is completed.


Case  3  is  computed  by  applying  the  computations  of  Case  2  repeatedly,  until  the 
Interceptor  arrives  at  the  AC  of  the  object,  the  interception  is  completed,  and  the  stage  is 
over  (i.e.,  decision  is  fathomed). 

3.4 Probability Distribution of the Information Vector w

The final piece in formulating SDP is the probability mass function of the information
vector $w = (\Delta t_w, r_w, i_w, z_w)$. Recall that $w$ describes the consequences of a decision to
visit a certain AC $x$: the time until the decision is fathomed, the locations of the
Recognizer and Interceptor when this happens, and whether a target has been intercepted.
Since our setting is discrete, so is $w$. Let $T^I_{i,j}$ denote the time it takes the Interceptor to
travel from AC $i$ to AC $j$ and to complete the processing of a suspected target in $j$. We
assume that this time is fixed and given.

We consider five different and exhaustive events that may occur given state
$s = (t, r, i, \pi, \theta)$ and decision $x$:

(i) no object is detected in AC $x$, which results in $w = (T^R_{r,x}, x, i, 0)$;

(ii) an object is present in AC $x$ and it exits the AOI while being tracked, i.e.,
$w = (T^R_{r,x} + 1, x, i, 0)$;

(iii) an object is present in AC $x$, it moves to AC $j \in \mathcal{A}$ and is identified by the
Recognizer as a neutral, i.e., $w = (T^R_{r,x} + 1, j, i, 0)$;

(iv) an object is present in AC $x$, it moves to AC $j \in \mathcal{A}$, is identified by the
Recognizer as a target, and when intercepted is confirmed as a neutral, i.e.,
$w = (T^R_{r,x} + 1 + T^I_{i,j}, j, j, 0)$;

(v) as event (iv) but when intercepted the object is confirmed as a target, i.e.,
$w = (T^R_{r,x} + 1 + T^I_{i,j}, j, j, 1)$.


We need the following notation. Let

$$d = \begin{cases} -1, & \text{if there is no object in AC } x \\ 0, & \text{if there is an object in AC } x \text{ and while being tracked it exits } \mathcal{A} \\ j, & \text{if there is an object in AC } x \text{ and while being tracked it moves to AC } j \end{cases} \qquad (14)$$

and

$$f = \begin{cases} 0, & \text{if there is an object in AC } x \text{ and, following tracking, it is not identified as a target} \\ 1, & \text{if there is an object in AC } x \text{ that is identified as a target.} \end{cases} \qquad (15)$$

Note that $f = 0$ can either imply that the tracked object is identified by the Recognizer as
a neutral, or that the object has left the AOI. Let $\pi_x$ denote the probability given by (5)
when $t'$ is replaced by $t + T^R_{r,x}$ and $\theta_x$ denote the probability given by (6) when $t'$ is
replaced by $t + T^R_{r,x}$. We next consider each of the five events in turn.

Event (i) is equivalent to $\{d = -1\}$ and, hence,

$$\Pr\{(i)\} = \Pr\{d = -1\} = 1 - \pi_x - \theta_x. \qquad (16)$$

Event (ii) is equivalent to $\{d = 0\}$ and, hence,

$$\Pr\{(ii)\} = \Pr\{d = 0\} = P(x, \mathcal{A}_0) \, \pi_x + Q(x, \mathcal{A}_0) \, \theta_x. \qquad (17)$$

To compute the probabilities of the other three events, let $\theta^{Rec}$ denote the probability
that, following tracking, the Recognizer identifies the object as a target; see (12). Recall
that a tracked object is identified as a target if $\theta^{Rec} > M$, where $M$ is a given probability
threshold. With a slight abuse of notation, let $\{x = \text{target}\}$ and $\{x = \text{neutral}\}$ denote the
events that AC $x$ contains a target and a neutral, respectively, at the time when the
Recognizer arrives at AC $x$. We defer the calculation of event (iii) and next compute the
probability of event (iv).

For any $j \in \mathcal{A}$,

$$\begin{aligned} \Pr\{(iv)\} &= \Pr\{f = 1, d = j, x = \text{neutral}\} \\ &= \Pr\{d = j, x = \text{neutral}\} \Pr\{f = 1 \mid d = j, x = \text{neutral}\} \\ &= \Pr\{d = j, x = \text{neutral}\} \Pr\{\theta^{Rec} > M \mid d = j, x = \text{neutral}\} \\ &= \Pr\{d = j, x = \text{neutral}\} \sum_{n'=0}^{g} \Pr\{\theta^{Rec} > M \mid d = j, x = \text{neutral}, n = n'\} \Pr\{n = n' \mid d = j, x = \text{neutral}\}, \end{aligned} \qquad (18)$$

where $g$ is the given total number of glimpses the Recognizer takes while tracking the
object, and $n \le g$ is the number of glimpses that returned “neutral” cues. Note that for
every $j \in \mathcal{A}$, we can calculate the maximum value of $n$ for which $\theta^{Rec} > M$. Let $n_j$
denote this value. Thus,

$$\Pr\{\theta^{Rec} > M \mid d = j, n = n'\} = \begin{cases} 1, & \text{if } n' \le n_j \\ 0, & \text{if } n' > n_j. \end{cases} \qquad (19)$$

Hence, in view of (18),

$$\begin{aligned} &\Pr\{d = j, x = \text{neutral}\} \sum_{n'=0}^{g} \Pr\{\theta^{Rec} > M \mid d = j, x = \text{neutral}, n = n'\} \Pr\{n = n' \mid d = j, x = \text{neutral}\} \\ &\quad = \Pr\{d = j, x = \text{neutral}\} \sum_{n'=0}^{n_j} \Pr\{n = n' \mid d = j, x = \text{neutral}\}. \end{aligned} \qquad (20)$$

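Using Bayes' rule for the first multiplicative term on the right-hand side of (20), we
obtain that

$$\begin{aligned} \Pr\{(iv)\} &= \Pr\{d = j, x = \text{neutral}\} \sum_{n'=0}^{n_j} \Pr\{n = n' \mid d = j, x = \text{neutral}\} \\ &= \Pr\{d = j \mid x = \text{neutral}\} \Pr\{x = \text{neutral}\} \sum_{n'=0}^{n_j} \Pr\{n = n' \mid d = j, x = \text{neutral}\} \\ &= P(x,j) \, \pi_x \sum_{n'=0}^{n_j} \binom{g}{n'} v^{n'} (1-v)^{g-n'}. \end{aligned} \qquad (21)$$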

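Following a similar derivation, we obtain for event (v) that

$$\Pr\{(v)\} = \Pr\{f = 1, d = j, x = \text{target}\} = Q(x,j) \, \theta_x \sum_{n'=0}^{n_j} \binom{g}{n'} (1-u)^{n'} u^{g-n'}. \qquad (22)$$

Finally, for event (iii) we follow the derivation in events (iv) and (v) and obtain that for
$j \in \mathcal{A}$,

$$\Pr\{(iii)\} = \Pr\{f = 0, d = j\} = P(x,j) \, \pi_x \sum_{n'=n_j+1}^{g} \binom{g}{n'} v^{n'} (1-v)^{g-n'} + Q(x,j) \, \theta_x \sum_{n'=n_j+1}^{g} \binom{g}{n'} (1-u)^{n'} u^{g-n'}. \qquad (23)$$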
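
As a cross-check on the five event probabilities, the following sketch computes them for one visit and lets the caller confirm that they sum to one. It assumes that the number of "neutral" cues is binomial with success probability v for a neutral and 1 - u for a target, as in the signature update, and that each row of the transition matrices sums to one over the AOI and the outside area. The function and argument names are illustrative, and the cutoffs n_j are taken as given.

```python
from math import comb


def binom_mass(g, p, lo, hi):
    """Sum of Binomial(g, p) probabilities for counts lo..hi inclusive."""
    return sum(comb(g, k) * p**k * (1 - p) ** (g - k) for k in range(lo, hi + 1))


def event_probabilities(pi_x, theta_x, P_row, Q_row, P_exit, Q_exit, g, u, v, n_max):
    """Probabilities of events (i)-(v) for a visit to AC x.

    pi_x, theta_x  : neutral / target presence probabilities on arrival at x
    P_row, Q_row   : dicts j -> P(x, j), Q(x, j) over ACs j inside the AOI
    P_exit, Q_exit : one-period probabilities of leaving the AOI from x
    n_max          : dict j -> n_j, the largest number of "neutral" cues for
                     which the object is still flagged (-1 if it never is)
    """
    events = {
        "i": 1.0 - pi_x - theta_x,
        "ii": P_exit * pi_x + Q_exit * theta_x,
        "iii": {}, "iv": {}, "v": {},
    }
    for j in P_row:
        nj = n_max[j]
        neutral, target = P_row[j] * pi_x, Q_row[j] * theta_x
        # Flagged (at most nj neutral cues): a neutral gives (iv), a target (v).
        events["iv"][j] = neutral * binom_mass(g, v, 0, nj)
        events["v"][j] = target * binom_mass(g, 1 - u, 0, nj)
        # Not flagged (more than nj neutral cues): event (iii).
        events["iii"][j] = (neutral * binom_mass(g, v, nj + 1, g)
                            + target * binom_mass(g, 1 - u, nj + 1, g))
    return events
```

Summing event (i), event (ii), and the per-cell entries of events (iii) to (v) should return one, which offers a quick way to catch indexing mistakes in the cutoffs.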

3.5  Computation  of  Bellman’s  Equation 

Given a state $s = (t, r, i, \pi, \theta)$ and information $w = (\Delta t_w, r_w, i_w, z_w)$, we see from (1) that
$c(w,s)$ is only a function of $t$, $\Delta t_w$, and $z_w$. Hence, for the computation of
$E[c(w,s)] = \sum_{w' \in \mathcal{W}} c(w',s) \Pr\{w = w'\}$ we only need the joint probability distribution of
$\Delta t_w$ and $z_w$. Similarly, $s^M(s,x,w)$ is only a function of $\Delta t_w$, $r_w$, and $i_w$; see (13). Hence,
we only need the joint probability distribution of these three random variables for the
calculation of $E[V(s^M(s,x,w))] = \sum_{w' \in \mathcal{W}} V(s^M(s,x,w')) \Pr\{w = w'\}$. The detailed
derivation of Bellman's equation is given in Appendix A.

The resulting size of SDP is large; the number of different paths the Recognizer can take
during the time horizon $T$ is no larger than $|\mathcal{A}|^T$, and therefore the number of different
values of $\pi$ and $\theta$ is no larger than $|\mathcal{A}|^T$. Hence, the state space size is
$|\mathcal{S}| = T \cdot |\mathcal{A}| \cdot |\mathcal{A}| \cdot |\mathcal{A}|^T = T \cdot |\mathcal{A}|^{T+2}$. The size of the information space is $|\mathcal{W}| = 3 \cdot |\mathcal{A}| + 2$.

While in principle an SDP policy can be determined using the Backward Dynamic
Programming Algorithm (see for example [3], p. 50), most situations result in a model
that renders that algorithm impractical due to its exponential run time complexity of
$O(T \cdot |\mathcal{A}|^{T+3} \cdot (3 \cdot |\mathcal{A}| + 2))$. Thus, we consider a heuristic algorithm.
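
To get a feel for these bounds, the short sketch below evaluates them with exact integer arithmetic. It reads the state-space bound as T times |A| to the power T + 2, the information-space size as 3|A| + 2, and the work bound as their product times |A|; the function name `sdp_sizes` is an illustrative assumption.

```python
def sdp_sizes(num_cells, horizon):
    """Return (state-space bound, information-space size, work bound)."""
    states = horizon * num_cells ** (horizon + 2)
    info = 3 * num_cells + 2
    # One expectation over the information space per state and decision.
    work = horizon * num_cells ** (horizon + 3) * info
    return states, info, work
```

For instance, `sdp_sizes(25, 20)` gives a state-space bound on the order of 10^32, which illustrates why exact backward recursion is out of reach even for a modest grid.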

4.  Heuristic  Algorithm  and  Model  Relaxation 

In  this  section  we  develop  a  simple  greedy  heuristic  for  solving  SDP  and  examine  its 
effectiveness  using  a  relaxation. 

4.1  Heuristic  Algorithm 

For any state $s = (t, r, i, \pi, \theta)$, we define the heuristic policy

$$x^H(s) \in \arg\max_{a \in \mathcal{A}} \left\{ \frac{\theta_a}{T^R_{r,a} + 1 + T^I_{i,a}} \right\} \tag{24}$$

where the numerator ($\theta_a$) is the probability of a target in AC $a$ at the time the Recognizer reaches AC $a$, computed by (6), and the denominator is the approximate total time to interception. This is a greedy policy that balances the likelihood of a target in a certain AC against the "cost" in time that such a visit would incur. In somewhat different, but related, search situations, similar greedy policies are proven to be optimal (see, e.g., [6], [14]).
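The greedy rule in (24) can be sketched in a few lines. This is only an illustration under stated assumptions: the denominator is taken to be the Recognizer's travel time to AC a, plus one time period, plus the Interceptor's travel time to AC a, and the three input lists (with hypothetical names) are assumed to be precomputed for the current state.

```python
def greedy_next_ac(theta_at_arrival, recognizer_time, interceptor_time):
    """Return the index of the AC maximizing probability per unit time.

    theta_at_arrival[a]: probability of a target in AC a when the
        Recognizer would arrive there (assumed given, e.g. from (6)).
    recognizer_time[a]: Recognizer transition time to AC a.
    interceptor_time[a]: Interceptor transition time to AC a.
    """
    best_ac, best_score = None, float("-inf")
    for a, theta in enumerate(theta_at_arrival):
        # The "+ 1" keeps the denominator strictly positive.
        total_time = recognizer_time[a] + 1 + interceptor_time[a]
        score = theta / total_time
        if score > best_score:
            best_ac, best_score = a, score
    return best_ac


if __name__ == "__main__":
    # Three hypothetical ACs: the most likely one is not chosen
    # because it is the most expensive to reach.
    print(greedy_next_ac([0.10, 0.30, 0.20], [1, 2, 1], [0, 4, 1]))
```

Ties are broken in favor of the lowest index here; the source does not specify a tie-breaking rule.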
4.2 Model Relaxation


The heuristic policy obviously results in a lower bound on the optimal value of SDP. To assess the quality of the heuristic, we define a relaxation of SDP, denoted by rSDP, which provides an upper bound on the optimal value of SDP. The relaxation rSDP is similar to SDP. A decision $x$ is identical, and the information is similar, but its probability distribution is different. The state transition functions of the two models are closely related and the rewards are practically the same, except that the functional notation is different because the two models use different state spaces. Lastly, the Bellman equations of the two models are almost identical, except that they use slightly different variables.

The main difference between SDP and rSDP is that the state space in the latter becomes considerably smaller by eliminating the two probability vectors $\pi$ and $\theta$. Each time a decision is fathomed, we "reset" the two probability vectors $\pi$ and $\theta$ to their initial, steady-state values at time $t = 0$, and therefore these two vectors need not be part of the state vector. In other words, the Recognizer is memoryless. By not nullifying the probabilities in an AC following a visit (see Sections 3.2 and 3.3), rSDP assigns each AC a probability of containing a target no smaller than the corresponding probability in SDP. Hence, rSDP is a relaxation of SDP. Because of this memoryless property, there is a risk that rSDP will generate a policy that "traps" the Recognizer in an AC that has a relatively high probability of a target. To avoid these traps in rSDP, we temporarily drop the probability of an object in the Recognizer's AC down to 0. This temporary update only holds until the current decision is fathomed. Once we complete the current state transition, we ignore this temporary update and reset to the steady-state probabilities. We next define rSDP precisely, where bars are used to denote parameters and variables related to rSDP.

We define a state in rSDP by

$$\bar{s} = (\bar{t}, \bar{r}, \bar{i}) \tag{25}$$

where $\bar{t}$, $\bar{r}$, and $\bar{i}$ are the time, Recognizer's location, and Interceptor's location, respectively. The state space is denoted by $\bar{\mathcal{S}}$. As in SDP, a decision $x \in \mathcal{A}$ is selecting the next AC to be visited by the Recognizer. Let the random vector $\bar{w} = (\Delta\bar{t}_w, \bar{r}_w, \bar{i}_w, \bar{z}_w)$ denote the information obtained when a decision is fathomed in rSDP. The definitions of the components of $\bar{w}$ and its space of possible values are exactly the same as in SDP, but the probability mass function is different.


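The state transition function $\bar{s}^M : \bar{\mathcal{S}} \times \bar{\mathcal{W}} \to \bar{\mathcal{S}}$ in rSDP differs from that in SDP because the decision $x$ is not included explicitly as an argument of the function but only implicitly by affecting the probability mass function of $\bar{w}$. We define

$$\bar{s}^M(\bar{s}, \bar{w}) = (\bar{t} + \Delta\bar{t}_w,\ \bar{r}_w,\ \bar{i}_w) \tag{26}$$

where $\bar{s} = (\bar{t}, \bar{r}, \bar{i})$ is the state and $\bar{w} = (\Delta\bar{t}_w, \bar{r}_w, \bar{i}_w, \bar{z}_w)$ is the obtained information. The reward $\bar{c} : \bar{\mathcal{W}} \times \bar{\mathcal{S}} \to \mathbb{R}$, which is a function of the information $\bar{w}$ and the state $\bar{s}$, is defined by

$$\bar{c}(\bar{w}, \bar{s}) = \begin{cases} \bar{z}_w \, (1+\gamma)^{-(\bar{t} + \Delta\bar{t}_w)}, & \bar{t} + \Delta\bar{t}_w \le T \\ 0, & \bar{t} + \Delta\bar{t}_w > T. \end{cases} \tag{27}$$

The value $\bar{V}(\bar{s})$ is given by the Bellman equation

$$\bar{V}(\bar{s}) = \begin{cases} \max_{x \in \mathcal{A}} E\left[ \bar{c}(\bar{w}, \bar{s}) + \bar{V}(\bar{s}^M(\bar{s}, \bar{w})) \right], & \bar{t} < T \\ 0, & \bar{t} \ge T. \end{cases} \tag{28}$$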
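The recursion in (26) through (28) can be illustrated with a short memoized sketch. It is not the authors' implementation: the callback `outcomes(t, r, i, x)` is a hypothetical interface standing in for the probability mass function of the information, returning tuples `(prob, dt, r_next, i_next, z)` with `dt >= 1`, and memoized recursion is used in place of an explicit backward sweep over time (both evaluate each state once).

```python
from functools import lru_cache


def solve_rsdp(horizon, acs, outcomes, gamma):
    """Return a function value(t, r, i) satisfying the Bellman recursion.

    outcomes(t, r, i, x) is a hypothetical callback listing the possible
    information realizations for decision x as tuples
    (prob, dt, r_next, i_next, z), with dt >= 1 so the recursion ends.
    """

    @lru_cache(maxsize=None)
    def value(t, r, i):
        if t >= horizon:
            return 0.0
        best = float("-inf")
        for x in acs:
            expected = 0.0
            for prob, dt, r_next, i_next, z in outcomes(t, r, i, x):
                # Discounted reward, earned only within the horizon.
                if t + dt <= horizon:
                    reward = z * (1.0 + gamma) ** (-(t + dt))
                else:
                    reward = 0.0
                expected += prob * (reward + value(t + dt, r_next, i_next))
            best = max(best, expected)
        return best

    return value


if __name__ == "__main__":
    # Toy model: one AC, every decision intercepts a target in one step.
    toy = lambda t, r, i, x: [(1.0, 1, x, x, 1)]
    print(solve_rsdp(3, [0], toy, gamma=0.0)(0, 0, 0))
```

In this toy model with no discounting, one unit of reward is collected in each of the three periods.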


Computing $\bar{V}(\bar{s})$ for rSDP is similar to computing $V(s)$ in SDP. The only difference is that the updated probabilities $\theta$ and $\pi$ are replaced by

$$\bar{\theta}_a = \begin{cases} \theta_a^0, & a \ne \bar{r} \\ 0, & a = \bar{r} \end{cases} \tag{29}$$

$$\bar{\pi}_a = \begin{cases} \pi_a^0, & a \ne \bar{r} \\ 0, & a = \bar{r} \end{cases} \tag{30}$$

where $\theta^0$ and $\pi^0$ are the steady-state probabilities; see (3) and (4). The derivation of the Bellman equation for rSDP is given in Appendix B. The state space in rSDP has cardinality $|\bar{\mathcal{S}}| = T \cdot |\mathcal{A}|^2$ and the run time of the backward dynamic programming algorithm is $O(T \cdot |\mathcal{A}|^3 \cdot (3 \cdot |\mathcal{A}| + 2))$, which is much faster than that for SDP. Hence, solving rSDP may be possible in reasonable time.
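As a rough illustration of the reset in (29) and (30) and of the resulting problem size, consider the sketch below. The function names are hypothetical; the size formulas are those stated above, evaluated for the 25-AC, 48-step scenario of Section 5.

```python
def rsdp_probabilities(steady_state, recognizer_ac):
    """Steady-state probabilities with the Recognizer's own AC set to 0."""
    return [0.0 if a == recognizer_ac else p
            for a, p in enumerate(steady_state)]


def rsdp_state_space_size(horizon, num_acs):
    """Number of rSDP states: T * |A|^2."""
    return horizon * num_acs ** 2


def rsdp_backward_dp_ops(horizon, num_acs):
    """Run-time bound of backward DP: T * |A|^3 * (3 * |A| + 2)."""
    return horizon * num_acs ** 3 * (3 * num_acs + 2)


if __name__ == "__main__":
    print(rsdp_probabilities([0.1, 0.2, 0.3], recognizer_ac=1))
    print(rsdp_state_space_size(48, 25))
    print(rsdp_backward_dp_ops(48, 25))
```

Tens of thousands of states and tens of millions of elementary operations are well within reach of a laptop, in contrast to the bound for SDP.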


5.  Model  Implementation 

We consider a maritime interdiction mission in an AOI comprising 25 ACs and a time horizon of 48 time steps. We also briefly consider a situation with 64 ACs. In these situations the relaxation rSDP is a tractable dynamic program and is solved to optimality using the Backward Dynamic Programming Algorithm (see for example [3], p. 50). Direct calculation of the value of the heuristic policy is impractical, so we estimate it by Monte Carlo simulation. All models and algorithms were implemented and analyzed using MATLAB on a MacBook Pro with a dual-core 2.53 GHz CPU and 4 GB of RAM.

5.1 Scenario Data

We are unable to present results for actual interdiction missions due to security constraints on operational data. However, we generate realistic scenarios based on unclassified information we obtained from active-duty naval officers who have operational experience with counter-drug operations [2]. The analysis comprises a baseline scenario and several variations thereof. The baseline scenario represents a strait-like AOI, with land on the North and South edges of the AOI (i.e., no arrivals from or departures to the North and South of the AOI). The AOI is a square grid comprising 25 ACs, each of size 5 nm × 5 nm (see Figure 4). The time horizon is 12 hours, divided into 48 time steps of 15 minutes each. Arrivals are only possible to ACs 1-10, that is, $\alpha_a = \beta_a = 0$ for $a = 11, \ldots, 25$. We assume that $\alpha_a = 0.05$ and $\beta_a = 0.01$ for $a = 1, \ldots, 10$. The transition probabilities of neutrals ($P$) and targets ($Q$) are different, representing different movement patterns. In a single time period, an object can only move to one of the four immediate neighboring ACs, or remain in the current AC. We assume that neutrals tend to move along the strait (West-East traffic), while targets tend to move perpendicular to the shipping lanes (North-South traffic). For any object, the probability of staying in its AC during a time step is 0.1, and the transition probability East (North) is equal to the transition probability West (South). For neutrals these probabilities are 0.3 East and 0.15 North, while for targets these probabilities are reversed. Objects exiting the AOI do not return.
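The movement model just described can be sketched as a pair of transition matrices. Several details are assumptions made only for this illustration and are not taken from the source: cells are indexed row by row from the north-west corner (which need not match the AC numbering of Figure 4), a move blocked by land to the North or South is treated as staying in place, and a move off the West or East edge removes the object from the AOI, so edge rows sum to less than 1.

```python
def build_transition_matrix(n_rows, n_cols, p_stay, p_east_west, p_north_south):
    """Build a (possibly sub-stochastic) transition matrix on a grid.

    p_east_west and p_north_south are per-direction probabilities,
    e.g. 0.30 means 0.30 East and 0.30 West.
    """
    n = n_rows * n_cols
    matrix = [[0.0] * n for _ in range(n)]
    moves = [(0, 1, p_east_west), (0, -1, p_east_west),
             (-1, 0, p_north_south), (1, 0, p_north_south)]
    for row in range(n_rows):
        for col in range(n_cols):
            a = row * n_cols + col
            matrix[a][a] += p_stay
            for d_row, d_col, prob in moves:
                new_row, new_col = row + d_row, col + d_col
                if not 0 <= new_row < n_rows:
                    # Assumption: land blocks the move, the object stays.
                    matrix[a][a] += prob
                elif 0 <= new_col < n_cols:
                    matrix[a][new_row * n_cols + new_col] += prob
                # Otherwise the object exits the AOI and does not return.
    return matrix


if __name__ == "__main__":
    neutrals = build_transition_matrix(5, 5, 0.1, 0.30, 0.15)  # P
    targets = build_transition_matrix(5, 5, 0.1, 0.15, 0.30)   # Q
    print(round(sum(neutrals[12]), 10), round(sum(neutrals[10]), 10))
```

With the stated values, the five probabilities for an interior cell sum to 0.1 + 2(0.3) + 2(0.15) = 1, so the parameters are internally consistent.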

In the base scenario both the Recognizer and the Interceptor start in AC 18. We assume that the Interceptor has roughly the same velocity as both the neutrals and targets, which is one AC per time period (approximately 20 knots in real life). The Recognizer's velocity is assumed to be four times the velocity of the Interceptor. The Recognizer's and Interceptor's transition times between ACs include the travel time and processing time (detection time for the Recognizer and boarding time for the Interceptor).

Figure 4. The baseline scenario AOI (the grid is 25 nm wide)

The Recognizer's sensor takes three glimpses at a tracked object ($g = 3$). The false positive and false negative detection probabilities of a target are 0.2 ($u = v = 0.8$). The discount factor is $\gamma = 0.05$, which means that the reward obtained from a target intercepted at the end of the 12-hour time horizon is approximately 1/10 of the reward obtained at $t = 0$. The value of the probability threshold $M$ for calling in the Interceptor is varied to examine its effects on the results.
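As a quick check of the stated discount, assuming the reward is discounted by $(1+\gamma)^{-t}$ per time step as in the reward function (27):

$$(1+\gamma)^{-48} = 1.05^{-48} = e^{-48 \ln 1.05} \approx e^{-2.342} \approx 0.096,$$

so a target intercepted at the last of the 48 time steps is worth roughly one tenth of a target intercepted at $t = 0$.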

With the given hardware and software, rSDP is solved in approximately 30 minutes, and estimating the expected total reward under the heuristic policy by Monte Carlo simulation, stopping when the 95% confidence interval has width less than 5% of its center, takes about 6 minutes.
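The stopping rule described above can be sketched as follows. This is an illustrative sequential estimator, not the authors' MATLAB code: `simulate_once` is a hypothetical callback returning the total reward of one simulated mission, and the normal-approximation quantile 1.96 and the minimum of 30 runs are assumptions.

```python
import math
import random


def estimate_expected_reward(simulate_once, rel_width=0.05,
                             min_runs=30, max_runs=1_000_000, z=1.96):
    """Simulate until the 95% CI width is below rel_width times the mean.

    Returns (mean, half_width, number_of_runs).
    """
    n, total, total_sq = 0, 0.0, 0.0
    mean, half_width = 0.0, float("inf")
    while n < max_runs:
        reward = simulate_once()
        n += 1
        total += reward
        total_sq += reward * reward
        if n < min_runs:
            continue
        mean = total / n
        # Unbiased sample variance, clipped at 0 against round-off.
        variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
        half_width = z * math.sqrt(variance / n)
        if mean > 0 and 2 * half_width < rel_width * mean:
            break
    return mean, half_width, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for a mission simulation: reward 1 with probability 0.5.
    mean, half_width, runs = estimate_expected_reward(
        lambda: 1.0 if rng.random() < 0.5 else 0.0)
    print(runs, 2 * half_width < 0.05 * mean)
```

Because the required number of runs grows with the variance relative to the squared mean, scenarios with small expected rewards need more simulated missions to reach the same relative precision.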

In addition to the base scenario, we also considered scenarios with zero discounting, longer transition time for the Interceptor, a 96-period (24-hour) time horizon, and a 1600 nm² AOI.

5.2  Numerical  Results 

We first examine the performance of the heuristic policy described in Section 4.1. Recall that the heuristic and rSDP policies provide lower and upper bounds, respectively, for the optimal expected reward of SDP. Table 1 and Figure 5 present the expected reward for both policies in the baseline scenario, using various threshold values of $M$. The error bars in Figure 5 (and later in Figures 6 and 7) represent 95% confidence intervals of the estimated expected reward following the heuristic policy. The average gap between the two expected rewards is about 30%, with relatively little sensitivity to the choice of $M$. This means that the heuristic policy results in an expected reward that is at least about 70% of the optimal value in these situations.
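The step from the gap to this performance guarantee can be made explicit. Write $H$ for the heuristic's expected reward, $V^*$ for the optimal value of SDP, and $U$ for the rSDP value (symbols introduced here only for illustration), and assume the reported gap is $(U - H)/U$, which is consistent with the entries of Table 1. Then

$$H \le V^* \le U \quad\Longrightarrow\quad \frac{H}{V^*} \ge \frac{H}{U} = 1 - \frac{U - H}{U}.$$

For example, the reported gap of 29.7% at $M = 0.05$ gives $H/V^* \ge 0.703$.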


| Probability threshold M | Upper bound expected reward | Heuristic expected reward | % gap |
|---|---|---|---|
| 0 | 0.72 | 0.50 | 30.9 |
| 0.01 | 0.75 | 0.52 | 30.5 |
| 0.05 | 0.76 | 0.54 | 29.7 |
| 0.1 | 0.77 | 0.52 | 32.2 |
| 0.15 | 0.77 | 0.53 | 30.8 |
| 0.25 | 0.74 | 0.51 | 31.0 |
| 0.35 | 0.74 | 0.51 | 31.4 |
| 0.5 | 0.68 | 0.47 | 30.8 |
| 0.75 | 0.63 | 0.41 | 34.3 |
| 0.9 | 0.44 | 0.30 | 32.2 |

Table 1. Expected rewards for Heuristic and Upper Bounding (rSDP) policies in Baseline scenario for various probability threshold values M

Figure 5. Graphical representation of Table 1

| Probability threshold M | Upper bound expected reward | Heuristic expected reward | % gap |
|---|---|---|---|
| 0 | 1.92 | 1.21 | 37.1 |
| 0.01 | 2.02 | 1.22 | 39.4 |
| 0.05 | 2.10 | 1.28 | 39.1 |
| 0.1 | 2.10 | 1.27 | 39.2 |
| 0.15 | 2.10 | 1.23 | 41.1 |
| 0.25 | 2.03 | 1.20 | 40.6 |
| 0.35 | 2.03 | 1.23 | 39.4 |
| 0.5 | 1.81 | 1.14 | 37.0 |
| 0.75 | 1.72 | 0.90 | 47.8 |

Table 2. Expected rewards for Heuristic and Upper Bounding (rSDP) policies in a no-discounting scenario for various probability threshold values M


Table 2 and Figure 6 present the same results for the case with a discount factor of zero. In this case the gap is larger than in the baseline scenario, with an average gap of about 40%. The shapes of the graphs in Figures 5 and 6 are similar. The slightly better performance of the heuristic under discounting may be explained by the greater focus on near-term rewards, rather than long-term rewards, in SDP in that case.

From the baseline scenario (Table 1 and Figure 5), we observe that the expected reward is generally decreasing in the probability threshold $M$ for $M > 0.05$. In other words, larger thresholds (than 0.05) result in worse performance of the interdiction force. This observation appears to be counterintuitive, as one would expect a larger threshold to be more efficient so that the Interceptor and the Recognizer do not waste time dealing with unlikely targets. To better understand this counterintuitive result, we evaluated two additional scenarios with longer interception times that result from a longer on-board inspection time ("boarding time"). Table 3 and Figure 7 compare the results for three interception times: (1) base scenario, (2) base scenario + 5 time periods, and (3) base scenario + 20 time periods.


Columns are grouped by boarding time (0, 5, and 20 periods). UB = upper bound expected reward; Heur. = heuristic expected reward.

| Probability threshold M | UB (0) | Heur. (0) | % gap (0) | UB (5) | Heur. (5) | % gap (5) | UB (20) | Heur. (20) | % gap (20) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.72 | 0.50 | 30.9 | 0.35 | 0.29 | 17.3 | 0.09 | 0.09 | 3.5 |
| 0.01 | 0.75 | 0.52 | 30.5 | 0.39 | x | x | 0.10 | x | x |
| 0.05 | 0.76 | 0.54 | 29.7 | 0.42 | 0.34 | 19.7 | 0.12 | 0.11 | 12.1 |
| 0.1 | 0.77 | 0.52 | 32.2 | 0.45 | x | x | x | x | x |
| 0.15 | 0.77 | 0.53 | 30.8 | 0.45 | 0.36 | 21.1 | x | 0.12 | x |
| 0.25 | 0.74 | 0.51 | 31.0 | 0.46 | x | x | 0.14 | x | x |
| 0.35 | 0.74 | 0.51 | 31.4 | 0.46 | 0.35 | 23.3 | 0.14 | 0.13 | 10.3 |
| 0.5 | 0.68 | 0.47 | 30.8 | 0.43 | 0.33 | 23.2 | 0.14 | 0.12 | 13.8 |
| 0.75 | 0.63 | 0.41 | 34.3 | 0.41 | 0.30 | 25.5 | x | x | x |
| 0.9 | 0.44 | 0.30 | 32.2 | 0.30 | 0.22 | 26.6 | 0.11 | 0.09 | 18.5 |

Table 3. Sensitivity of expected reward for Heuristic and Upper Bounding (rSDP) policies to boarding time (x marks scenarios which have not been calculated)

Figure 7. Graphical representation of Table 3

A  threshold  value  of  approximately  0.2  appears  to  be  the  best  threshold  in  the  scenario 
with  boarding  time  of  five  time  periods,  while  a  value  of  approximately  0.4  is  the  best 
threshold  in  the  scenario  with  boarding  times  of  20  time  periods.  In  any  case,  the 
threshold  M  is  relatively  small.  This  result  is  consistent  with  common  practice  in  which 
even  the  slightest  suspicion  triggers  investigation.  In  a  sparsely  populated  environment, 
such  as  the  one  modeled  in  this  analysis,  it  is  “better  to  be  safe  than  sorry,”  even  at  the 
expense  of  many  false  alarms. 

Finally, we investigate the heuristic's performance for a longer time horizon and a larger AOI, where all other parameters remain the same as in the base scenario. For a 24-hour scenario ($T = 96$ time periods), the heuristic's expected reward is approximately 0.57 (with a 95% confidence interval of width less than 0.03) and that of rSDP is 0.85, a gap of 33%, which is similar to the gap in the shorter scenario. For a 1600 nm² AOI with 8nm-by-8nm ACs, the heuristic's expected reward is approximately 0.45 (with a 95% confidence interval of width 0.02), while that of rSDP is 0.62, a gap of 28%, which is also in agreement with previously presented results.

6. Conclusions


We  developed  a  stochastic  dynamic-programming  model  for  a  combined  search  and 
interdiction  operation.  The  operation  comprises  an  airborne  sensor  for  detection, 
identification,  and  tracking  of  suspected  objects  and  a  surface  vessel  or  ground  vehicle 
for  subsequent  interception.  While  the  model  is  rich  and  reflects  real-world  military  and 
naval  operations,  it  is  also  intractable  by  standard  algorithms.  Thus,  we  developed  a 
greedy  heuristic  policy,  which  results  in  a  lower  bound  on  the  optimal  expected  number 
of  successful  interdictions  within  the  planning  horizon,  and  a  relaxation  of  the  model, 
which  generates  an  upper  bound.  We  show  that  for  certain  realistic  maritime  interdiction 
scenarios  the  gap  between  the  two  bounds  is  in  the  range  of  30%  -  40%.  The  study 
provides  the  operational  insight  that  the  threshold  for  triggering  investigation  by  the 
surface  vessel  is  quite  low.  For  realistic  situations  examined  in  this  paper,  a  target 
(posterior)  probability  as  low  as  0.1  after  tracking  and  identification  by  the  airborne 
sensor should result in interception of the potential target by the surface vessel.

Appendix A


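This Appendix provides details about the calculations of Bellman's equation in SDP; see Section 3.5. For notational convenience, we define for any $x \in \mathcal{A}$, $s = (t, r, i, \pi, \theta)$, and $w = (\Delta t_w, r_w, i_w, z_w)$,

$$\tilde{c}(\Delta t_w, z_w, s) = c(w, s) \tag{31}$$

$$\tilde{s}^M(s, x, \Delta t_w, r_w, i_w) = s^M(s, x, w). \tag{32}$$

Using these functions, we find that

$$E[c(w,s)] = E[\tilde{c}(\Delta t_w, z_w, s)] = \sum_{z'_w=0}^{1} \sum_{\Delta t'_w=0}^{\infty} \tilde{c}(\Delta t'_w, z'_w, s) \Pr\{\Delta t_w = \Delta t'_w,\ z_w = z'_w\} \tag{33}$$

and

$$E[V(s^M(s,x,w))] = E[V(\tilde{s}^M(s,x,\Delta t_w, r_w, i_w))] = \sum_{\Delta t'_w=0}^{\infty} \sum_{r'_w \in \mathcal{A}} \sum_{i'_w \in \mathcal{A}} V(\tilde{s}^M(s,x,\Delta t'_w, r'_w, i'_w)) \Pr\{\Delta t_w = \Delta t'_w,\ r_w = r'_w,\ i_w = i'_w\} \tag{34}$$

where

$$\Pr\{\Delta t_w = \Delta t'_w,\ z_w = z'_w\} = \sum_{r'_w \in \mathcal{A}} \sum_{i'_w \in \mathcal{A}} \Pr\{\Delta t_w = \Delta t'_w,\ r_w = r'_w,\ i_w = i'_w,\ z_w = z'_w\} \tag{35}$$

$$\Pr\{\Delta t_w = \Delta t'_w,\ r_w = r'_w,\ i_w = i'_w\} = \sum_{z'_w=0}^{1} \Pr\{\Delta t_w = \Delta t'_w,\ r_w = r'_w,\ i_w = i'_w,\ z_w = z'_w\}. \tag{36}$$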

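Using (36) and the probability mass function of $w$, we find that

$$E[V(s^M(s,x,w))] = (I)(II) + \sum_{j \in \mathcal{A}} (III)(IV) + \sum_{j \in \mathcal{A}} (V)(VI) + (VII)(VIII) \tag{37}$$

where

$$(I) = V(\tilde{s}^M(s, x, T^R_{r,x}, x, i))$$

$$(II) = 1 - \pi_x - \theta_x$$

$$(III) = V(\tilde{s}^M(s, x, T^R_{r,x} + 1 + T^I_{i,j}, j, j))$$

$$(IV) = \sum_{n'=0}^{n_j} \left[ \binom{g}{n'} (1-u)^{n'} u^{g-n'} Q(x,j)\,\theta_x + \binom{g}{n'} v^{n'} (1-v)^{g-n'} P(x,j)\,\pi_x \right]$$

$$(V) = V(\tilde{s}^M(s, x, T^R_{r,x} + 1, j, i))$$

$$(VI) = \sum_{n'=n_j+1}^{g} \left[ \binom{g}{n'} (1-u)^{n'} u^{g-n'} Q(x,j)\,\theta_x + \binom{g}{n'} v^{n'} (1-v)^{g-n'} P(x,j)\,\pi_x \right]$$

$$(VII) = V(\tilde{s}^M(s, x, T^R_{r,x} + 1, x, i))$$

$$(VIII) = Q(x, A)\,\theta_x + P(x, A)\,\pi_x.$$

Similarly, we can use (35) and the probability mass function of $w$ to compute

$$E[c(w,s)] = \theta_x \sum_{j \in \mathcal{A}} (1+\gamma)^{-(t + T^R_{r,x} + 1 + T^I_{i,j})} Q(x,j) \sum_{n'=0}^{n_j} \binom{g}{n'} (1-u)^{n'} u^{g-n'}. \tag{38}$$
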

Appendix  B 

In this Appendix we provide details about the calculations of Bellman's equation for rSDP. Let $\bar{\pi}_x$ denote the probability given by (5) when $t'$ and $\pi_a^t$ are replaced by $\bar{t} + T^R_{\bar{r},x}$ and $\pi_a^0$, respectively. Moreover, we let $\bar{\theta}_x$ denote the probability given by (6) when $t'$ and $\theta_a^t$ are replaced by $\bar{t} + T^R_{\bar{r},x}$ and $\theta_a^0$. Substituting $\theta$ and $\pi$ with (29) and (30) in (37) and (38), while explicitly computing the next state using the state transition function in (26), we get the following formulas for computing the Bellman equation for rSDP:

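$$E[\bar{V}(\bar{s}^M = (\bar{t} + \Delta\bar{t}_w, \bar{r}_w, \bar{i}_w))] = (I)(II) + \sum_{j \in \mathcal{A}} (III)(IV) + \sum_{j \in \mathcal{A}} (V)(VI) + (VII)(VIII) \tag{39}$$

where:

$$(I) = \bar{V}(\bar{s}^M = (\bar{t} + T^R_{\bar{r},x},\ x,\ \bar{i}))$$

$$(II) = 1 - \bar{\pi}_x - \bar{\theta}_x$$

$$(III) = \bar{V}(\bar{s}^M = (\bar{t} + T^R_{\bar{r},x} + 1 + T^I_{\bar{i},j},\ j,\ j))$$

$$(IV) = \sum_{n'=0}^{n_j} \left[ \binom{g}{n'} (1-u)^{n'} u^{g-n'} Q(x,j)\,\bar{\theta}_x + \binom{g}{n'} v^{n'} (1-v)^{g-n'} P(x,j)\,\bar{\pi}_x \right]$$

$$(V) = \bar{V}(\bar{s}^M = (\bar{t} + T^R_{\bar{r},x} + 1,\ j,\ \bar{i}))$$

$$(VI) = \sum_{n'=n_j+1}^{g} \left[ \binom{g}{n'} (1-u)^{n'} u^{g-n'} Q(x,j)\,\bar{\theta}_x + \binom{g}{n'} v^{n'} (1-v)^{g-n'} P(x,j)\,\bar{\pi}_x \right]$$

$$(VII) = \bar{V}(\bar{s}^M = (\bar{t} + T^R_{\bar{r},x} + 1,\ x,\ \bar{i}))$$

$$(VIII) = Q(x, A)\,\bar{\theta}_x + P(x, A)\,\bar{\pi}_x;$$

$$E[\bar{c}(\bar{w}, \bar{s})] = \bar{\theta}_x \sum_{j \in \mathcal{A}} (1+\gamma)^{-(\bar{t} + T^R_{\bar{r},x} + 1 + T^I_{\bar{i},j})} Q(x,j) \sum_{n'=0}^{n_j} \binom{g}{n'} (1-u)^{n'} u^{g-n'}. \tag{40}$$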


The state space in rSDP has cardinality $|\bar{\mathcal{S}}| = T \cdot |\mathcal{A}|^2$ and the run time of the backward dynamic programming algorithm is $O(T \cdot |\mathcal{A}|^3 \cdot (3 \cdot |\mathcal{A}| + 2))$, which is much faster than that for SDP.